Skip to content

Refactor jsx mode in parser #7751

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

nojaf
Copy link
Member

@nojaf nojaf commented Aug 2, 2025

I was experimenting with the parser related to JSX and noticed that we have a somewhat convoluted mechanism for handling /> and - in identifier names.

First, I created a token dump tool in res_parser, which was previously missing.

Currently, the parser employs a sequence of Scanner.set_jsx_mode p.Parser.scanner; and Scanner.pop_mode p.scanner Jsx; while processing elements to distinguish between parser JSX and non-JSX. However, this is mainly used in the following cases:

  • Allowing a - inside an identifier. This logic belongs in the parser, but it currently resides in the scanner, which feels inappropriate.
  • Combining a < + / into a </ token. I would prefer using a lookahead for when a < is encountered. This would clarify that it's specific to JSX parsing. Even though LessThanSlash exists, there is still a separate LessThan + Slash check, which makes the code a bit messy.

This is an effort to streamline the JSX mode.

PS: to run the local analysis tests, I had to revert to legacy clean. This is for the better until we figure out #7707

| LessThan ->
(* Imagine: <div> <Navbar /> <
* is `<` the start of a jsx-child? <div …
* or is it the start of a closing tag? </div>
* reconsiderLessThan peeks at the next token and
* determines the correct token to disambiguate *)
let token = Scanner.reconsider_less_than p.scanner in
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what bother me a bit, there is LessThanSlash above, yet we still need to do the reconsider_less_than call.

let attr_expr = parse_primary_expr ~operand:(parse_atomic_expr p) p in
Some (Parsetree.JSXPropValue ({txt = name; loc}, optional, attr_expr))
| _ -> Some (Parsetree.JSXPropPunning (false, {txt = name; loc})))
(* {...props} *)
| Lbrace -> (
Scanner.pop_mode p.scanner Jsx;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is rather confusing when you are in a nested jsx scenario:

<div>
  <p>
    {foo}
  </p>
</div>

Popping Jsx from p requires the pop of div also to happen to get out of Jsx mode.

posCursor:[30:12] posNoWhite:[30:11] Found expr:[30:9->32:10]
JSX <di:[30:10->30:12] div[32:6->32:9]=...[32:6->32:9]> _children:None
posCursor:[30:12] posNoWhite:[30:11] Found expr:[30:9->30:12]
JSX <di:[30:10->30:12] > _children:None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes is because there is slightly different AST for:

<div>
  <di
</div>

It used to be

[
  structure_item (A.res[1,0+0]..[3,12+6])
    Pstr_eval
    expression (A.res[1,0+0]..[3,12+6])
      Pexp_jsx_container_element "div" (A.res[1,0+1]..[1,0+4])
      jsx_props =
        []
      > [1,0+4]
      jsx_children =
        [
          expression (A.res[2,6+2]..[3,12+6])
            Pexp_jsx_container_element "di" (A.res[2,6+3]..[2,6+5])
            jsx_props =
              [
                div              ]
            > [3,12+5]
            jsx_children =
              []
        ]
]

and now is

[
  structure_item (A.res[1,0+0]..[3,12+6])
    Pstr_eval
    expression (A.res[1,0+0]..[3,12+6])
      Pexp_jsx_container_element "div" (A.res[1,0+1]..[1,0+4])
      jsx_props =
        []
      > [1,0+4]
      jsx_children =
        [
          expression (A.res[2,6+2]..[2,6+5])
            Pexp_jsx_unary_element "di" (A.res[2,6+3]..[2,6+5])
            jsx_props =
              []
        ]
]

I think this is more correct. The unary (new) versus container (old) doesn't matter that much, neither can be determined for <di.
However, the container had a weird prop div, which it no longer has.

Copy link

pkg-pr-new bot commented Aug 2, 2025

Open in StackBlitz

rescript

npm i https://pkg.pr.new/rescript-lang/rescript@7751

@rescript/darwin-arm64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/darwin-arm64@7751

@rescript/darwin-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/darwin-x64@7751

@rescript/linux-arm64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/linux-arm64@7751

@rescript/linux-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/linux-x64@7751

@rescript/win32-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/win32-x64@7751

commit: 452301a

@nojaf nojaf marked this pull request as ready for review August 2, 2025 12:00
@nojaf nojaf requested a review from cristianoc August 2, 2025 12:00
@nojaf nojaf changed the title Add token dump printer Refactor jsx mode in parser Aug 2, 2025
@nojaf
Copy link
Member Author

nojaf commented Aug 4, 2025

Hi @cristianoc, sorry for my eagerness, could you take a look at these changes?

@cristianoc
Copy link
Collaborator

Are there differences in white space behavior? Are these intended? (For composite tokens)? Or possible ambiguities when single characters are tokenised in isolation.
Traveling so could not take a look in detail, but these are the things that come to mind.

@nojaf
Copy link
Member Author

nojaf commented Aug 4, 2025

Or possible ambiguities when single characters are tokenised in isolation.

No, actually not, there is no other way the language can encounter </ besides JSX.
So that made me wonder if we needed the jsx mode in the first place.

Safe travels, will ask someone else for a review.
(Don't be shy to take a look at this once you are back 😇)

@zth , @shulhi , @aspeddro any volunteers?

@nojaf nojaf requested review from Copilot and removed request for cristianoc August 4, 2025 18:12
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the JSX mode handling in the ReScript parser by removing the convoluted JSX mode mechanism from the scanner and moving JSX-specific logic to the parser. The refactoring streamlines JSX parsing by eliminating the need for Scanner.set_jsx_mode and Scanner.pop_mode calls throughout the codebase, and introduces lookahead functionality for better JSX token handling.

Key changes:

  • Removes JSX mode from the scanner and replaces it with parser-level JSX identifier handling
  • Introduces lookahead functions (peekMinus, peekSlash) for better JSX token disambiguation
  • Adds a token debugger tool for development purposes

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
compiler/syntax/src/res_scanner.ml Removes JSX mode handling and adds lookahead functions for minus and slash characters
compiler/syntax/src/res_core.ml Implements JSX identifier parsing in the parser with lookahead-based logic
compiler/syntax/src/res_token.ml Removes LessThanSlash token type
compiler/syntax/src/res_token_debugger.ml Adds new token debugging utility
tests/syntax_tests/data/parsing/errors/expressions/expected/jsx.res.txt Updates expected error message
Various package.json files Reverts to legacy clean command for analysis tests

Comment on lines +740 to +747
p.token <- token
| Uident txt when Scanner.peekMinus p.scanner ->
let buffer = Buffer.create (String.length txt) in
Buffer.add_string buffer txt;
Parser.next p;
let name = visit buffer |> Buffer.contents in
let token = Token.Uident name in
p.token <- token
Copy link
Preview

Copilot AI Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Direct mutation of parser state (p.token <-) breaks encapsulation and makes the code harder to reason about. Consider using a proper parser method or returning the modified token instead of mutating parser state directly.

Suggested change
p.token <- token
| Uident txt when Scanner.peekMinus p.scanner ->
let buffer = Buffer.create (String.length txt) in
Buffer.add_string buffer txt;
Parser.next p;
let name = visit buffer |> Buffer.contents in
let token = Token.Uident name in
p.token <- token
set_token p token
| Uident txt when Scanner.peekMinus p.scanner ->
let buffer = Buffer.create (String.length txt) in
Buffer.add_string buffer txt;
Parser.next p;
let name = visit buffer |> Buffer.contents in
let token = Token.Uident name in
set_token p token

Copilot uses AI. Check for mistakes.

Comment on lines +740 to +747
p.token <- token
| Uident txt when Scanner.peekMinus p.scanner ->
let buffer = Buffer.create (String.length txt) in
Buffer.add_string buffer txt;
Parser.next p;
let name = visit buffer |> Buffer.contents in
let token = Token.Uident name in
p.token <- token
Copy link
Preview

Copilot AI Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Direct mutation of parser state (p.token <-) breaks encapsulation and makes the code harder to reason about. Consider using a proper parser method or returning the modified token instead of mutating parser state directly.

Suggested change
p.token <- token
| Uident txt when Scanner.peekMinus p.scanner ->
let buffer = Buffer.create (String.length txt) in
Buffer.add_string buffer txt;
Parser.next p;
let name = visit buffer |> Buffer.contents in
let token = Token.Uident name in
p.token <- token
set_token p token
| Uident txt when Scanner.peekMinus p.scanner ->
let buffer = Buffer.create (String.length txt) in
Buffer.add_string buffer txt;
Parser.next p;
let name = visit buffer |> Buffer.contents in
let token = Token.Uident name in
set_token p token

Copilot uses AI. Check for mistakes.

@cristianoc
Copy link
Collaborator

Hitting AI with AI

https://chatgpt.com/share/68933fb4-cef4-8011-a04c-a30808a82b89

Jump to the end for the relevant summary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants